4 research outputs found

HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Author: Chen Yingke
Opara Chidimma
Wei Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Recently, developing and launching phishing attacks has come to require little technical skill or cost. This has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning-based, data-driven, end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of concatenating the word and character embeddings allows our model to manage new features and ensures easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents whose distribution of phishing to benign web pages reflects the real world; HTMLPhish yields over 93% accuracy and true positive rate. Also, HTMLPhish is a completely language-independent, client-side strategy that can therefore detect phishing web pages regardless of their textual language.
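
The abstract's idea of combining word-level and character-level views of an HTML document can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the sequence lengths, and the helper names (`build_vocab`, `encode`, `encode_html`) are all assumptions made for illustration. It only produces the integer id sequences that embedding layers would consume, and it concatenates those ids rather than the learned embeddings themselves.

```python
import re

PAD, UNK = 0, 1  # reserved ids: padding and unknown token


def build_vocab(tokens):
    """Map each distinct token to an integer id, reserving 0 and 1."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens, vocab, length):
    """Turn tokens into a fixed-length id sequence (truncate or pad)."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:length]]
    return ids + [PAD] * (length - len(ids))


def word_tokens(html):
    # Hypothetical tokenizer: alphanumeric runs and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", html)


def char_tokens(html):
    return list(html)


def encode_html(html, word_vocab, char_vocab, word_len=8, char_len=16):
    """Concatenate the word-level and character-level id sequences."""
    words = encode(word_tokens(html), word_vocab, word_len)
    chars = encode(char_tokens(html), char_vocab, char_len)
    return words + chars


# Toy "training" document and an unseen test document (both made up).
train = '<form action="deliver.php">'
word_vocab = build_vocab(word_tokens(train))
char_vocab = build_vocab(char_tokens(train))
features = encode_html('<form action="evil.php">', word_vocab, char_vocab)
```

In this toy case the unseen word `evil` falls back to the unknown id at the word level, while every one of its characters is still covered by the character vocabulary. That is one plausible reading of how the character view helps a model cope with tokens it has not seen before.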

arXiv.org e-Print Archive

Northumbria Research Link

Crossref

Lancaster E-Prints

Look Before You Leap: Detecting Phishing Web Pages by Exploiting Raw URL And HTML Characteristics

Author: Chen Yingke
Opara Chidimma
Wei Bo
Publication date: 05/11/2020

Cybercriminals resort to phishing as a simple and cost-effective medium to perpetrate cyber-attacks on today's Internet. Recent studies in phishing detection are increasingly adopting automated feature selection over traditional manually engineered features. This transition is due to the inability of existing traditional methods to extrapolate their learning to new data. To this end, in this paper, we propose WebPhish, a deep learning technique that automatically selects features from the raw URL and HTML of a web page. This approach is the first of its kind to use the concatenation of URL and HTML embedding feature vectors as input into a Convolutional Neural Network model to detect phishing attacks on web pages. Extensive experiments on a real-world dataset yielded an accuracy of 98 percent, outperforming other state-of-the-art techniques. Also, WebPhish is a client-side strategy that is completely language-independent and can conduct lightweight phishing detection regardless of the web page's textual language.
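
To make the pipeline in this abstract concrete, here is a small pure-Python sketch of one possible reading: embed the URL and the HTML character by character, join the two embedded sequences, and pass a single convolution filter with ReLU and global max-pooling over the result. Everything here is an assumption for illustration, including the embedding size, the random (untrained) embedding table, the fixed filter weights, and the sample URL and markup. The abstract does not specify the architecture at this level of detail.

```python
import random

EMBED_DIM = 4  # illustrative embedding size, not taken from the paper


def make_embeddings(alphabet, dim=EMBED_DIM, seed=0):
    """Randomly initialised embedding table; a real model would learn these."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1, 1) for _ in range(dim)] for ch in alphabet}


def embed(text, table, dim=EMBED_DIM):
    """Look up one vector per character; unknown characters map to zeros."""
    return [table.get(ch, [0.0] * dim) for ch in text]


def conv1d_max(sequence, kernel):
    """Slide one filter over the sequence, apply ReLU, then global max-pool."""
    width = len(kernel)
    best = 0.0  # starting at 0 gives the ReLU floor
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        activation = sum(
            w * x
            for k_row, v_row in zip(kernel, window)
            for w, x in zip(k_row, v_row)
        )
        best = max(best, activation)
    return best


# Made-up inputs for illustration only.
url = "http://example.test/login"
html = "<form><input type='password'></form>"
table = make_embeddings(sorted(set(url + html)))

# Join the URL and HTML embedding sequences into one model input.
inputs = embed(url, table) + embed(html, table)
kernel = [[0.5] * EMBED_DIM for _ in range(3)]  # one filter spanning 3 characters
feature = conv1d_max(inputs, kernel)
```

A trained model would use many learned filters and feed the pooled features into a classifier. The single hand-set filter here only shows the data flow.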

arXiv.org e-Print Archive

Teesside University's Research Repository

A novel web page anti-phishing approach using URL cosine similarity and IP address comparison

Author: Ugochi Opara Chidimma
Publication venue: IADIS Press
Publication date: 31/12/2018

Teesside University's Research Repository
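
No abstract is listed for this entry, so the following is only a generic illustration of the cosine similarity named in the title, not the paper's method. It assumes URLs are compared as bags of character bigrams. The n-gram size, the lowercasing, and the function names are hypothetical choices, and the IP address comparison from the title is not covered.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b, n=2):
    """Cosine of the angle between the n-gram count vectors of two strings."""
    va, vb = ngrams(a.lower(), n), ngrams(b.lower(), n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a string shorter than n has no n-grams
    return dot / (norm_a * norm_b)


# A look-alike domain scores high but below 1.0 against the genuine one.
score = cosine_similarity("paypal.com", "paypa1.com")
```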

It’s All Connected: Detecting Phishing Transaction Records on Ethereum Using Link Prediction

Author: Chen Yingke
Opara Chidimma
Wei Bo
Publication venue: Springer
Publication date: 24/05/2024

Digital currencies are increasingly being used on platforms for virtual transactions, such as Ethereum, owing to new financial innovations. As these platforms are anonymous and easy to use, they are perfect places for phishing scams to grow. Unlike traditional phishing detection approaches, which aim to distinguish phishing websites and emails using their HTML content and URLs, phishing detection on Ethereum focuses on identifying phishing addresses by analyzing the transaction relationships on the virtual transaction platform. This study proposes a link prediction framework for detecting phishing transactions on the Ethereum platform using 12 local network-based features extracted from the Ether receiving and initiating addresses. The framework was trained and tested on over 280,000 verified phishing and legitimate transaction records. Experimental results indicate that the proposed framework with a LightGBM classifier provides a high recall of 89% and an AUC score of 93%.
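
The abstract does not list its 12 features, so the sketch below uses three classic local link prediction scores (common neighbours, Jaccard coefficient, preferential attachment) purely as stand-ins. The function names, the undirected treatment of transactions, and the toy addresses are all assumptions. In a pipeline like the one described, such per-transaction feature values would then be passed to a classifier.

```python
from collections import defaultdict


def build_neighbors(transactions):
    """Undirected neighbour sets from (sender, receiver) address pairs."""
    neighbors = defaultdict(set)
    for sender, receiver in transactions:
        neighbors[sender].add(receiver)
        neighbors[receiver].add(sender)
    return neighbors


def local_features(neighbors, sender, receiver):
    """Local network scores for one candidate sender-receiver link."""
    a = neighbors.get(sender, set())
    b = neighbors.get(receiver, set())
    common = a & b
    union = a | b
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(a) * len(b),
    }


# Toy transaction graph with made-up addresses.
transactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
graph = build_neighbors(transactions)
features = local_features(graph, "A", "D")
```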

Teesside University's Research Repository